Draft assembly and annotation of the Cuban crocodile (Crocodylus rhombifer) genome

Objectives: The new data provide an important genomic resource for the Critically Endangered Cuban crocodile (Crocodylus rhombifer). Cuban crocodiles are restricted to the Zapata Swamp in southern Matanzas Province, Cuba, and readily hybridize with the widespread American crocodile (Crocodylus acutus) in areas of sympatry. The reported de novo assembly will contribute to studies of crocodylian evolutionary history and provide a resource for informing Cuban crocodile conservation.

Data description: The final 2.2 Gb draft genome for C. rhombifer consists of 41,387 scaffolds (contig N50 = 104.67 kb; scaffold N50 = 518.55 kb). Benchmarking Universal Single-Copy Orthologs (BUSCO) identified 92.3% of the 3,354 genes in the vertebrata_odb10 database. Approximately 42% of the genome (960 Mb) consists of repeat elements. We predicted 30,138 unique protein-coding sequences (17,737 unique genes) in the genome assembly. Functional annotation showed that the top Gene Ontology terms for Biological Process, Molecular Function, and Cellular Component were regulation, protein, and intracellular, respectively. This assembly will support future macroevolutionary, conservation, and molecular studies of the Cuban crocodile.
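N50 is the headline contiguity statistic reported above (contig N50 = 104.67 kb; scaffold N50 = 518.55 kb). As a minimal illustration of how the metric is defined, the Python sketch below computes N50 from a list of sequence lengths. The function name `n50` and the toy lengths are hypothetical; this is not the assembly-statistics tool used for this genome.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of sequence lengths.

    N50 is the length L such that sequences (contigs or scaffolds) of
    length >= L together contain at least half of all assembled bases.
    """
    # Walk from the longest sequence to the shortest.
    sorted_lengths = sorted(lengths, reverse=True)
    if not sorted_lengths:
        raise ValueError("need at least one sequence length")
    half_total = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        # The first sequence that brings the running sum to half the total is the N50.
        if running >= half_total:
            return length
    # Not reached: the running sum always reaches the full total.
    return sorted_lengths[-1]


# Toy example (hypothetical lengths, not data from this assembly).
print(n50([100, 80, 60, 40, 20]))  # 80
```

In the toy example the two longest sequences (100 + 80 = 180 bases) already exceed half of the 300-base total, so N50 is 80. The same calculation applied to contig lengths gives the contig N50, and applied to scaffold lengths (which include gaps between linked contigs) gives the scaffold N50.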

and land conversion continues to threaten this declining population. In addition, hybridization with the widespread American crocodile (Crocodylus acutus) in areas of sympatry may pose a further anthropogenic threat, exacerbated by freshwater management and habitat modification [2].
A number of distinguishing morphological and behavioral traits have been described for this species [5,6]. These include prominent cranial 'horns', heavily scaled and colorful skin, a robust skull, adaptations to a more terrestrial lifestyle, and aggressive, intelligent hunting strategies [5,7]. Previous phylogenetic and phylogenomic studies have not resolved the exact placement of C. rhombifer within the monophyletic Neotropical Crocodylus radiation [2, 8–10]. Whole-genome sequencing provides a strong basis for testing hypotheses about the biogeographic history of this species and the evolution of its novel morphological and behavioral traits, and may also offer insights into conservation threats and opportunities for this enigmatic species. Presented here is the first genome assembly for the Cuban crocodile.

Data description
For a detailed description of all methods, see Table 1, Data file 1. High-molecular-weight DNA was extracted from a non-hybrid Cuban crocodile ([2]; Table 1, Data file 2) using the QIAGEN® MagAttract HMW DNA Kit. 10X Genomics Chromium Genome library preparation and sequencing were performed at the New York Genome Center. The libraries were sequenced as 150 bp paired-end reads on an Illumina HiSeq X (1,717.59 million reads at ~65X coverage; mean read length 138.5 bp; Table 1, Data file 3).
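The ~65X depth quoted above follows the standard relation C = N × L / G, where N is the number of reads, L the mean read length, and G the genome size. The sketch below computes the total read yield from the reported read count and mean read length. The genome-size estimate behind the 65X figure is not stated in this section, so G is left as a caller-supplied parameter; the function names are hypothetical.

```python
def total_bases(n_reads: int, mean_read_length: float) -> float:
    """Total sequenced bases: number of reads times mean read length."""
    return n_reads * mean_read_length


def expected_coverage(n_reads: int, mean_read_length: float, genome_size: float) -> float:
    """Average sequencing depth C = N * L / G.

    genome_size (G) is a genome-size estimate in bases. It must be
    supplied by the caller because it is not given in this section.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases(n_reads, mean_read_length) / genome_size


# Read count and mean read length as reported for the C. rhombifer libraries.
yield_bp = total_bases(1_717_590_000, 138.5)
print(f"total yield: {yield_bp / 1e9:.1f} Gb")  # prints "total yield: 237.9 Gb"
```

Dividing the roughly 238 Gb yield by a genome-size estimate gives the expected average depth. The resulting value depends on which estimate of G is used and on whether all reads or only filtered reads are counted.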

Limitations
The draft genome was generated using short-read (10X Genomics linked-read) sequencing of DNA from a scale sample. As a result, the assembly is somewhat fragmented and smaller than the genome size estimate. The Cuban crocodile is naturally restricted to a developing country (Cuba) with limited research resources and limited access to sequencing technology. Consequently, generating genomic data from a non-hybrid, wild-caught specimen was limited to the most accessible sequencing technology available at the time of collection. If additional funding becomes available, the completeness and accuracy of the genome will be improved using long-read sequencing technologies.

Table 1
Overview of data files/data sets

Data file 1: Detailed description of the methodology (Microsoft Word, .docx)
Data file 5: Interspersed repeat landscape
Data file 6: Percentages of repeat elements